Federated learning has become a popular machine learning paradigm with many potential real-life applications, including recommendation systems, the Internet of Things (IoT), healthcare, and self-driving cars. Though most current applications focus on classification-based tasks, learning personalized generative models remains largely unexplored, and their benefits in the heterogeneous setting are not yet well understood. This work proposes a novel architecture combining global client-agnostic and local client-specific generative models. We show that, using standard techniques for training federated models, our proposed model achieves privacy and personalization by implicitly disentangling the globally consistent representation (i.e., content) from the client-dependent variations (i.e., style). Using such a decomposition, personalized models can generate locally unseen labels while preserving the given style of the client, and can predict the labels for all clients with high accuracy by training a simple linear classifier on the global content features. Furthermore, disentanglement enables other essential applications, such as data anonymization by sharing only content. Extensive experimental evaluation corroborates our findings, and we also provide partial theoretical justifications for the proposed approach.
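A minimal sketch of how "standard techniques for training federated models" can be applied to a global/local split, assuming each client's parameters are stored under two hypothetical keys, "global" (client-agnostic content model) and "local" (client-specific style model). Only the global part is averaged by the server in FedAvg fashion; this is an illustration of the split, not the authors' training procedure.

```python
import numpy as np

def aggregate_global_only(clients, weights):
    """One server round: weighted-average the shared (content) parameters only.

    Each client is a dict with hypothetical keys "global" and "local".
    The "local" (style) parameters never leave the client and are not touched.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    avg = sum(wi * c["global"] for wi, c in zip(w, clients))
    for c in clients:
        c["global"] = avg.copy()  # broadcast the averaged content model back
    return avg

clients = [
    {"global": np.array([0.0, 0.0]), "local": np.array([1.0])},
    {"global": np.array([2.0, 4.0]), "local": np.array([-1.0])},
]
aggregate_global_only(clients, weights=[1, 1])
```

After the round, both clients share the same content parameters while keeping their own style parameters.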
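In this work, we propose new adaptive step size strategies that improve several stochastic gradient methods. Our first method (StoPS) is based on the classical Polyak step size (Polyak, 1987) and extends SPS (Loizou et al., 2021), a recent development of this step size for stochastic optimization; our second method, denoted GraDS, rescales the step size by the "diversity of stochastic gradients". We provide a theoretical analysis of these methods for strongly convex smooth functions and show that they enjoy deterministic rates despite using stochastic gradients. Furthermore, we demonstrate the theoretical superiority of our adaptive methods on quadratic objectives. Unfortunately, both StoPS and GraDS depend on unknown quantities, which are only practical for overparameterized models. To remedy this, we drop this undesired dependence and redefine StoPS and GraDS as StoP and GraD, respectively. We show that these new methods converge linearly to a neighborhood of the optimal solution under the same assumptions. Finally, we corroborate our theoretical claims with experimental validation, which reveals that GraD is particularly useful for deep learning optimization.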
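The exact StoPS and GraDS formulas are not given in this abstract, so the sketch below shows only the SPS rule it builds on, in the form $\gamma_k = \frac{f_i(x_k) - f_i^*}{c\,\|\nabla f_i(x_k)\|^2}$ (Loizou et al., 2021). The dependence on $f_i^*$ is the kind of "unknown quantity" mentioned above; for an interpolating (overparameterized-style) least-squares problem, $f_i^* = 0$ is known. The function name and the choice of problem are assumptions for illustration.

```python
import numpy as np

def sps_least_squares(A, b, c=0.5, iters=2000, seed=0):
    """SGD with the stochastic Polyak step size on f_i(x) = 0.5*(a_i @ x - b_i)**2.

    Assumes interpolation: A x = b is consistent, so every f_i^* = 0.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(n)
        residual = A[i] @ x - b[i]
        loss_gap = 0.5 * residual**2       # f_i(x) - f_i^*, using f_i^* = 0
        grad = residual * A[i]
        gnorm2 = grad @ grad
        if gnorm2 == 0.0:                  # sample already fitted exactly
            continue
        step = loss_gap / (c * gnorm2)     # stochastic Polyak step size
        x -= step * grad
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
x_true = rng.standard_normal(5)
b = A @ x_true
x_hat = sps_least_squares(A, b)
```

With $c = 0.5$ on this problem, the step reduces to $1/\|a_i\|^2$, i.e., randomized Kaczmarz, which converges linearly on consistent systems.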
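Federated Learning (FL) is an emerging machine learning paradigm involving multiple clients, e.g., mobile phone devices, with an incentive to collaborate in solving a machine learning problem coordinated by a central server. FL was proposed by Konečný et al. and McMahan et al. as a viable privacy-preserving alternative to traditional centralized machine learning since, by construction, the training data points are decentralized and never transferred by the clients to a central server. Therefore, to a certain degree, FL mitigates the privacy risks associated with centralized data collection. Unfortunately, optimization for FL faces several specific issues that centralized optimization usually does not need to handle. In this thesis, we identify several of these challenges and propose new methods and algorithms to address them, with the ultimate goal of enabling practical FL solutions supported by mathematically rigorous guarantees.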
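Applying multiple local updates before aggregating across clients has been empirically shown to be a successful approach to overcoming the communication bottleneck in Federated Learning (FL). In this work, we propose a general recipe, FedShuffle, that better utilizes the local updates in FL, especially in the heterogeneous regime. Unlike many prior works, FedShuffle does not assume any uniformity in the number of updates per device. Our FedShuffle recipe comprises four simple yet powerful ingredients: 1) local shuffling of the data, 2) adjustment of the local learning rates, 3) update weighting, and 4) momentum variance reduction (Cutkosky and Orabona, 2019). We present a comprehensive theoretical analysis of FedShuffle and show that, both theoretically and empirically, our approach does not suffer from the objective function mismatch present in FL methods that assume homogeneous updates in heterogeneous FL setups, such as FedAvg (McMahan et al., 2017). In addition, by combining the ingredients above, FedShuffle improves upon FedNova (Wang et al., 2020), which was previously proposed to solve this mismatch. We also show that FedShuffle with momentum variance reduction can improve upon non-local methods under a Hessian similarity assumption. Finally, through experiments on synthetic and real-world datasets, we illustrate how each of the four ingredients used in FedShuffle helps improve the use of local updates in FL.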
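The objective function mismatch can be seen on a toy problem. The sketch below assumes two clients with scalar quadratics $f_i(y) = \tfrac{1}{2}(y - c_i)^2$, $c_1 = 0$, $c_2 = 1$, so the average objective is minimized at 0.5. When client 2 runs 10 local steps and client 1 runs 1, plain FedAvg drifts to a different fixed point. Dividing each client's learning rate by its number of local steps is used here only as a simple stand-in for ingredient 2; it is an assumption, not FedShuffle's exact rule.

```python
def local_gd(x, center, lr, steps):
    """Run `steps` gradient steps on f_i(y) = 0.5 * (y - center)**2."""
    y = x
    for _ in range(steps):
        y = y - lr * (y - center)
    return y

def fedavg_fixed_point(centers, local_steps, lr, normalize_lr, rounds=500):
    """Iterate FedAvg rounds (equal client weights) and return the final server model."""
    x = 0.0
    for _ in range(rounds):
        updates = []
        for c, k in zip(centers, local_steps):
            lr_i = lr / k if normalize_lr else lr   # hypothetical step-count scaling
            updates.append(local_gd(x, c, lr_i, k) - x)
        x = x + sum(updates) / len(updates)
    return x

centers = [0.0, 1.0]   # true minimizer of the average objective: 0.5
steps = [1, 10]        # heterogeneous numbers of local updates
naive = fedavg_fixed_point(centers, steps, lr=0.1, normalize_lr=False)
scaled = fedavg_fixed_point(centers, steps, lr=0.1, normalize_lr=True)
```

Here `naive` settles near 0.87, while `scaled` lands near 0.49. The remaining gap comes from the two clients' local contractions still not being exactly equal, which is why the abstract combines several ingredients rather than relying on learning-rate scaling alone.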
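Federated Learning (FL) has emerged as a promising technique for edge devices to collaboratively learn a shared machine learning model while keeping the training data on the device, thereby removing the need to store and access the full data in the cloud. However, given the heterogeneity of common edge-device settings, FL is difficult to implement, test, and deploy in practice, which makes it fundamentally hard for researchers to efficiently prototype and test their optimization algorithms. In this work, we aim to alleviate this problem by introducing FL_PyTorch: a suite of open-source software written in Python that builds on top of PyTorch, one of the most popular deep learning (DL) frameworks for research. We built FL_PyTorch as a research simulator for FL to enable fast development, prototyping, and experimentation with new and existing FL optimization algorithms. Our system supports abstractions that give researchers sufficient flexibility to experiment with existing and novel approaches to advance the state of the art. Furthermore, FL_PyTorch is a simple-to-use console system that allows running several clients simultaneously using local CPUs or GPUs, and even remote compute devices, without requiring the user to provide any distributed implementation. FL_PyTorch also offers a graphical user interface. For new methods, researchers only provide a centralized implementation of their algorithm. To showcase the possibilities and usefulness of our system, we experiment with several well-known state-of-the-art FL algorithms and a few of the most common FL datasets.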
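Federated Learning (FL) is an increasingly popular machine learning paradigm in which multiple nodes try to learn collaboratively under privacy, communication, and multiple heterogeneity constraints. A persistent problem in federated learning is that it is not clear what the optimization objective should be: the standard average risk minimization of supervised learning is inadequate for handling several major constraints specific to federated learning, such as communication adaptivity and personalization control. We identify several key desiderata for frameworks for federated learning and introduce a new framework, FLIX, that takes into account the unique challenges brought by federated learning. FLIX has a standard finite-sum form, which enables practitioners to tap into the immense wealth of existing (potentially non-local) methods for distributed optimization. Through a smart initialization that does not require any communication, FLIX does not require the use of local steps but is still provably capable of performing dissimilarity regularization on par with local methods. We give several algorithms for solving the FLIX formulation efficiently under communication constraints. Finally, we corroborate our theoretical results with extensive experimentation.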
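A minimal sketch of the FLIX objective, assuming the formulation $F(x) = \frac{1}{n}\sum_i f_i(\alpha_i x + (1-\alpha_i) x_i^*)$, where $x_i^*$ is client $i$'s local optimum (computed once, with no communication) and $\alpha_i \in (0, 1]$ controls personalization. The clients here are hypothetical scalar quadratics $f_i(y) = \tfrac{1}{2} h_i (y - c_i)^2$, so $x_i^* = c_i$, and plain gradient descent stands in for the distributed solvers the abstract refers to.

```python
import numpy as np

def flix_gd(h, centers, alpha, iters=200):
    """Gradient descent on a FLIX-style objective for scalar quadratic clients.

    Returns the shared model x and each client's personalized model
    alpha_i * x + (1 - alpha_i) * x_i^*. Requires some alpha_i > 0.
    """
    h, c, a = (np.asarray(v, dtype=float) for v in (h, centers, alpha))
    x_local = c.copy()          # local optima x_i^*: the communication-free initialization
    L = np.mean(a**2 * h)       # smoothness constant of F
    x = 0.0
    for _ in range(iters):
        y = a * x + (1 - a) * x_local     # points where each f_i is evaluated
        grad = np.mean(a * h * (y - c))   # chain rule: alpha_i * f_i'(y_i)
        x -= grad / L
    personalized = a * x + (1 - a) * x_local
    return x, personalized

x_full, p_full = flix_gd([1, 1, 1], [0, 2, 4], alpha=[1.0, 1.0, 1.0])
x_half, p_half = flix_gd([1, 1, 1], [0, 2, 4], alpha=[0.5, 0.5, 0.5])
```

With every $\alpha_i = 1$ this reduces to standard average risk minimization (all clients get 2); with $\alpha_i = 0.5$ the personalized models move halfway toward each client's own optimum, giving 1, 2, and 3.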
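In the last few years, various communication compression techniques have emerged as an indispensable tool for alleviating the communication bottleneck in distributed learning. However, although biased compressors often show superior performance in practice compared with the much more studied and better understood unbiased compressors, very little is known about them. In this work, we study three classes of biased compression operators, two of which are new, and their performance when applied to (stochastic) gradient descent and distributed (stochastic) gradient descent. We show for the first time that biased compressors can lead to linear convergence rates in both the single-node and distributed settings. We prove that the distributed compressed SGD method, combined with an error feedback mechanism, enjoys the ergodic rate $\mathcal{O}\left(\delta L \exp\left[-\frac{\mu k}{\delta L}\right] + \frac{C + \delta D}{k\mu}\right)$ after $k$ iterations, where $\delta \ge 1$ is a compression parameter that grows as more compression is applied, $L$ and $\mu$ are the smoothness and strong convexity constants, $C$ captures the stochastic gradient noise ($C = 0$ if full gradients are computed on each node), and $D$ captures the variance of the gradients at the optimum ($D = 0$ for over-parameterized models). Further, via a theoretical study of several synthetic and empirical distributions of communicated gradients, we shed light on why, and by how much, biased compressors outperform their unbiased variants. Finally, we propose several new biased compressors with promising theoretical guarantees and practical performance.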
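A single-node sketch of compressed gradient descent with error feedback, using top-k sparsification as the biased compressor. It assumes a contraction-type compression class, $\|\mathcal{C}(x) - x\|^2 \le (1 - 1/\delta)\|x\|^2$, under which top-k on $d$ coordinates corresponds to $\delta = d/k$ (this $k$ counts kept coordinates and is unrelated to the iteration counter in the rate). The quadratic test problem has full gradients, so $C = 0$ and $D = 0$; the step size is deliberately small relative to $1/(\delta L)$.

```python
import numpy as np

def top_k(v, num_keep):
    """Keep the num_keep largest-magnitude entries of v and zero the rest (biased)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -num_keep)[-num_keep:]
    out[idx] = v[idx]
    return out

def ef_gd(grad, x0, lr, num_keep, iters):
    """Compressed gradient descent with error feedback (single node, full gradients)."""
    x = x0.astype(float).copy()
    e = np.zeros_like(x)               # memory of what compression has dropped so far
    for _ in range(iters):
        p = e + lr * grad(x)           # re-inject the accumulated error
        delta = top_k(p, num_keep)     # only this sparse vector would be communicated
        x -= delta
        e = p - delta                  # store the new compression error
    return x

rng = np.random.default_rng(0)
d = 10
x_star = rng.standard_normal(d)
x_ef = ef_gd(lambda x: x - x_star, np.zeros(d), lr=0.003, num_keep=1, iters=20000)
```

Without the error memory `e`, coordinates that are rarely selected by top-k would be updated far less often; error feedback ensures their gradient mass is eventually applied.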
Recent advances in deep learning have enabled us to address the curse of dimensionality (COD) by solving problems in higher dimensions. A subset of these approaches to addressing the COD has made it possible to solve high-dimensional PDEs, opening doors to a variety of real-world problems ranging from mathematical finance to stochastic control for industrial applications. Although feasible, these deep learning methods are still constrained by training time and memory. To tackle these shortcomings, we show that Tensor Neural Networks (TNN) can provide significant parameter savings while attaining the same accuracy as the classical Dense Neural Network (DNN). We also show that TNN can be trained faster than DNN for the same accuracy. Besides TNN, we introduce Tensor Network Initializer (TNN Init), a weight initialization scheme that leads to faster convergence with smaller variance than a DNN with an equivalent parameter count. We benchmark TNN and TNN Init by applying them to solve the parabolic PDE associated with the Heston model, which is widely used in financial pricing theory.
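To make the "parameter savings" claim concrete, the sketch below assumes a tensor-train (matrix product operator) factorization of a single dense layer's weight matrix; the abstract does not specify the TNN layer structure, so the shapes and ranks here are illustrative assumptions. A 256 x 256 matrix is factored into four cores with row and column modes of size 4 and bond dimension 8.

```python
import numpy as np

def tt_matrix_params(row_dims, col_dims, ranks):
    """Parameter count of a matrix stored as tensor-train (MPO) cores.

    Core k has shape (ranks[k], row_dims[k], col_dims[k], ranks[k + 1]),
    with ranks[0] == ranks[-1] == 1.
    """
    return sum(ranks[k] * m * n * ranks[k + 1]
               for k, (m, n) in enumerate(zip(row_dims, col_dims)))

def tt_to_dense(cores):
    """Contract TT cores back into the dense matrix they represent."""
    w = cores[0][0]                              # (m1, n1, r1): drop leading rank-1 axis
    for core in cores[1:]:                       # core shape: (r, m, n, r_next)
        w = np.einsum("abr,rcds->acbds", w, core)
        a, c, b, d, s = w.shape
        w = w.reshape(a * c, b * d, s)           # merge row modes and column modes
    return w[..., 0]                             # drop trailing rank-1 axis

rows = cols = (4, 4, 4, 4)
ranks = (1, 8, 8, 8, 1)
rng = np.random.default_rng(0)
cores = [rng.standard_normal((ranks[k], rows[k], cols[k], ranks[k + 1]))
         for k in range(len(rows))]
W = tt_to_dense(cores)
print(W.shape, tt_matrix_params(rows, cols, ranks), W.size)  # (256, 256) 2304 65536
```

Under these assumptions the factorized layer stores 2,304 parameters instead of 65,536; the bond dimension controls the trade-off between compression and expressiveness.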
Managing novelty in perception-based human activity recognition (HAR) is critical in realistic settings to improve task performance over time and to ensure that solutions generalize beyond previously seen samples. Novelty manifests in HAR as unseen samples, activities, objects, environments, and sensor changes, among other ways. Novelty may be task-relevant, such as a new class or new features, or task-irrelevant, resulting in nuisance novelty, such as never-before-seen noise, blur, or distorted video recordings. To perform HAR optimally, algorithmic solutions must be tolerant to nuisance novelty and learn over time in the face of novelty. This paper 1) formalizes the definition of novelty in HAR, building upon the prior definition of novelty in classification tasks, 2) proposes an incremental open world learning (OWL) protocol and applies it to the Kinetics datasets to generate a new benchmark, KOWL-718, 3) analyzes the performance of current state-of-the-art HAR models when novelty is introduced over time, and 4) provides a containerized and packaged pipeline for reproducing the OWL protocol and for modifying it for any future updates to Kinetics. The experimental analysis includes an ablation study of how the different models perform under various conditions as annotated by Kinetics-AVA. The protocol, as an algorithm for reproducing experiments using the KOWL-718 benchmark, will be publicly released with code and containers at https://github.com/prijatelj/human-activity-recognition-in-an-open-world. The code may be used to analyze different annotations and subsets of the Kinetics datasets in an incremental open world fashion, and it can be extended as further updates to Kinetics are released.
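The incremental open world idea can be sketched as follows, under simplifying assumptions that are not the KOWL-718 protocol itself: classes are split into consecutive increments; at each step the model is first evaluated on data that includes the next increment's classes, which it has never seen and which are therefore relabeled as "unknown", and is then trained on the labeled data of that increment. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

UNKNOWN = "unknown"
Sample = Tuple[str, str]  # (sample id, class label)

def make_increments(classes: Sequence[str], n_increments: int) -> List[List[str]]:
    """Split the class list into consecutive, roughly equal increments."""
    size = -(-len(classes) // n_increments)  # ceiling division
    return [list(classes[i:i + size]) for i in range(0, len(classes), size)]

def evaluation_set(samples: Sequence[Sample], increments: List[List[str]],
                   step: int) -> List[Sample]:
    """Samples evaluated at `step`, before the model trains on increments[step].

    Classes from increments[:step] are known; classes first appearing in
    increments[step] are novel and relabeled as UNKNOWN.
    """
    known = {c for inc in increments[:step] for c in inc}
    visible = known | set(increments[step])
    return [(x, y if y in known else UNKNOWN) for x, y in samples if y in visible]

def training_set(samples: Sequence[Sample], increments: List[List[str]],
                 step: int) -> List[Sample]:
    """After evaluation, the model is updated with labeled data of the new increment."""
    new = set(increments[step])
    return [(x, y) for x, y in samples if y in new]

classes = ["a", "b", "c", "d", "e", "f"]
increments = make_increments(classes, 3)
samples = [(f"clip{i}", c) for i, c in enumerate(classes)]
eval_step1 = evaluation_set(samples, increments, 1)
train_step1 = training_set(samples, increments, 1)
```

At step 1 the model knows classes "a" and "b"; clips of "c" and "d" must be flagged as unknown before they are learned.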
Quantum computing (QC) promises significant advantages over classical computers on certain hard computational tasks. However, current quantum hardware, known as noisy intermediate-scale quantum (NISQ) devices, is still unable to carry out computations faithfully, mainly because it lacks quantum error correction (QEC) capability. A significant body of theoretical work has proposed various types of QEC codes; one notable topological code is the surface code, whose features, such as requiring only nearest-neighbor two-qubit control gates and having a large error threshold, make it a leading candidate for scalable quantum computation. Machine learning (ML)-based techniques, especially reinforcement learning (RL) methods, have recently been applied to the decoding problem and have already made some progress. Nevertheless, the device noise pattern may change over time, rendering trained decoder models ineffective. In this paper, we propose a continual reinforcement learning method to address these decoding challenges. Specifically, we implement a double deep Q-learning with probabilistic policy reuse (DDQN-PPR) model to learn surface code decoding strategies for quantum environments with varying noise patterns. Through numerical simulations, we show that the proposed DDQN-PPR model can significantly reduce the computational complexity. Moreover, increasing the number of trained policies can further improve the agent's performance. Our results open a way to build more capable RL agents that can leverage previously gained knowledge to tackle QEC challenges.
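The two components named in DDQN-PPR can be sketched independently of the surface-code environment. The double DQN target lets the online network choose the next action and the target network evaluate it. For policy reuse, the sketch assumes a single previously trained policy that is followed with probability `psi`, with epsilon-greedy exploration on the current Q-values otherwise; how the paper selects among several stored policies and schedules `psi` is not described here, so those details are omitted.

```python
import random

import numpy as np

def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma):
    """y = r + gamma * (1 - done) * Q_target(s', argmax_a Q_online(s', a))."""
    best = np.argmax(next_q_online, axis=1)                  # online net selects
    evaluated = next_q_target[np.arange(len(best)), best]    # target net evaluates
    return rewards + gamma * (1.0 - dones) * evaluated

def ppr_action(q_values, past_action, psi, epsilon, rng):
    """Probabilistic policy reuse: follow a past policy's action with probability psi."""
    if rng.random() < psi:
        return past_action                                   # reuse prior knowledge
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                  # explore
    return int(np.argmax(q_values))                          # exploit current Q

targets = double_dqn_targets(
    rewards=np.array([1.0, 0.0]),
    next_q_online=np.array([[1.0, 2.0], [3.0, 0.0]]),
    next_q_target=np.array([[5.0, 7.0], [4.0, 9.0]]),
    dones=np.array([0.0, 1.0]),
    gamma=0.9,
)
```

For the first transition the online network picks action 1 and the target network values it at 7, giving $1 + 0.9 \cdot 7 = 7.3$; the second transition is terminal, so its target is just its reward, 0.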